Mapping Sites at Risk#

First we need to ensure we have all of the data from the preliminary analysis again.

#First recreate the data from the preliminary analysis
import geopandas as gpd
import pandas as pd

# Load and join GMCA housing, industrial and office supply data
housing_supply_gdf = gpd.read_file("data/gmca_data/2024 GM Housing Land Supply GIS.shp")
industrial_supply_gdf = gpd.read_file("data/gmca_data/2024 GM Industrial-warehousing Land Supply GIS.shp")
offices_supply_gdf = gpd.read_file("data/gmca_data/2024 GM Offices Land Supply GIS.shp")
total_supply_gdf = pd.concat(
    [housing_supply_gdf, industrial_supply_gdf, offices_supply_gdf]
)

# Load and tidy GMEU Sites of Biological Importance data
sbi_gdf = gpd.read_file("data/gmeu_data/gm_sbi.shp")
sbi_gdf["Category"] = "Site of Biological Importance"
sbi_gdf = sbi_gdf.rename(columns = {"district": "LAName", "site_nam": "SiteRef"})

# Join GMCA and GMEU data
full_data_gdf = pd.concat(
    [total_supply_gdf, sbi_gdf[["SiteRef", "LAName", "Category", "geometry"]]]
)

#Use geopandas to get centroids of all the sites
full_data_gdf["centroid"] = full_data_gdf.centroid
full_data_gdf["ref"] = range(len(full_data_gdf))

#Split into sites of biological importance and non-biological importance
sbi = full_data_gdf[full_data_gdf["Category"] == "Site of Biological Importance"]
non_sbi = full_data_gdf[full_data_gdf["Category"] != "Site of Biological Importance"]

# Find the number of new developments less than 1 km away for each SBI
sbinames = list(sbi["SiteRef"])  # site names, in order
indexes = list(sbi["ref"])

less_than_1km = []
for x in sbi["centroid"]:  # loop through each SBI centroid
    distances = non_sbi["centroid"].distance(x)  # distances to every development
    less_than_1km.append((distances < 1000).sum())  # count developments under 1 km

# Create a dataframe of SBIs and their development counts
Dev_1km = pd.DataFrame({"SiteRef": sbinames, "No. Sites in 1km": less_than_1km, "ref": indexes})

# Add a scaled risk variable (pd.cut bins are right-inclusive)
bins = [-1, 0, 10, 20, 30, 40]
labels = [0, 1, 2, 3, 4]
Dev_1km["Risk Scale"] = pd.cut(Dev_1km["No. Sites in 1km"], bins=bins, labels=labels)

Dev_1km
|     | SiteRef | No. Sites in 1km | ref | Risk Scale |
| --- | --- | --- | --- | --- |
| 0 | Big Wood | 0 | 4357 | 0 |
| 1 | Winstanley Hall Woods | 1 | 4358 | 1 |
| 2 | Ackhurst Lane Sand Workings | 3 | 4359 | 1 |
| 3 | Abbey Lakes | 6 | 4360 | 1 |
| 4 | Wetland by M6 | 1 | 4361 | 1 |
| … | … | … | … | … |
| 531 | Mill Race & Pasture at Haughton Dale | 8 | 4888 | 1 |
| 532 | Three Sisters | 2 | 4889 | 1 |
| 533 | Nan Nook Wood | 7 | 4890 | 1 |
| 534 | Big Wood | 10 | 4891 | 1 |
| 535 | Bank Wood & Marsh | 11 | 4892 | 2 |

536 rows × 4 columns
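For reference, the per-site distance loop above can also be expressed as a single spatial join: buffer each SBI centroid by 1 km and count the developments that fall inside. A minimal sketch on synthetic points (the coordinates below are illustrative stand-ins, not the real data):

```python
import geopandas as gpd
from shapely.geometry import Point

# Synthetic stand-ins (coordinates in metres) for SBI centroids and developments
sbi_pts = gpd.GeoDataFrame(geometry=[Point(0, 0), Point(5000, 0)])
dev_pts = gpd.GeoDataFrame(geometry=[Point(500, 0), Point(900, 0), Point(5400, 0), Point(9000, 0)])

# Buffer each SBI centroid by 1 km, then spatially join against the developments
buffers = sbi_pts.copy()
buffers["geometry"] = buffers.geometry.buffer(1000)
joined = gpd.sjoin(buffers, dev_pts, predicate="intersects")

# One row per matched (SBI, development) pair; group by the SBI index to count
counts = joined.groupby(joined.index).size().reindex(buffers.index, fill_value=0)
print(counts.tolist())  # [2, 1]
```

This avoids the Python-level loop, which matters little at 536 sites but scales better to larger datasets.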

Mapping scaled risk#

Following on from the preliminary analysis, our first task is to map the scaled risk category using the geopandas explore function. First, for reference, here is a map showing all 536 sites of biological importance.

# First, here is a map of all the sites of biological importance.
sbi.explore("Category", cmap="tab10", tiles="CartoDB positron")
# Merge the original geo-locations dataframe with the risk-score dataframe
risk_locations = pd.merge(sbi, Dev_1km, on="ref")

# Explore the risk scale with geopandas
risk_locations.explore("Risk Scale", cmap="YlOrRd", tiles="CartoDB positron")

From this, it is immediately notable that the scale drops from the darkest reds very quickly, and that the bin boundaries are ultimately arbitrary. We therefore decided it would be useful to consider the data in terms of percentiles.

Exploring a percentile scale for risk#

Below, we calculate the 20th, 40th, 60th, 80th, 90th, 95th and 97.5th percentiles of the data. At first, we were going to leave out the 90th, 95th and 97.5th percentiles, but the 80th percentile is just 10, while the largest value is 38. Most of the variation clearly occurs at the highest values, so choosing a risk threshold requires a closer look at these upper percentiles.

# Alternative way of calculating risk: percentiles
import numpy as np

percentiles = [20, 40, 60, 80, 90, 95, 97.5, 100]
cuts = [int(np.percentile(Dev_1km["No. Sites in 1km"], p)) for p in percentiles]

bins = [-1] + cuts  # upper limit inclusive
labels = percentiles

print(bins)
[-1, 1, 4, 7, 10, 14, 17, 21, 38]
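As a quick sanity check of how `pd.cut` assigns labels with these right-inclusive bins, a minimal sketch (the sample counts below are synthetic, chosen only for illustration):

```python
import pandas as pd

bins = [-1, 1, 4, 7, 10, 14, 17, 21, 38]  # from the print-out above
labels = [20, 40, 60, 80, 90, 95, 97.5, 100]

sample = pd.Series([0, 5, 15, 38])  # synthetic counts for illustration
print(pd.cut(sample, bins=bins, labels=labels).tolist())  # [20, 60, 95, 100]
```

Each value is mapped to the label of the first bin whose upper edge it does not exceed, so a count of 15 lands in the (14, 17] bin and gets the 95th-percentile label.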

Considering these numbers, a better heuristic than the one in the preliminary analysis is to treat as “at risk” those sites with 15 or more development sites within 1 km of the SBI, as this encompasses approximately 10% of all SBIs.

This also accounts for the fact that our method, which measures distances between site centroids, is very size-dependent. Within 1 km there could be fewer small developments but a few large ones that pose a major threat to an SBI, so a less sensitive heuristic allows for this possibility.
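The share of SBIs a given threshold captures can be checked directly on `Dev_1km["No. Sites in 1km"]`; a minimal sketch on synthetic counts (the values below are illustrative stand-ins, not the real data):

```python
import pandas as pd

# Synthetic stand-in for Dev_1km["No. Sites in 1km"]
counts = pd.Series([0, 1, 3, 6, 8, 12, 15, 16, 21, 38])

threshold = 15
share = (counts >= threshold).mean()  # fraction of sites at or above the threshold
print(f"{share:.0%} of sites have {threshold} or more developments within 1 km")
```

Running the same line on the real data is how the “approximately 10%” figure above can be verified.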

# Plot the percentile scale on a map

Dev_1km["Percentile Scale"] = pd.cut(Dev_1km["No. Sites in 1km"], bins=bins, labels=labels)

risk_locations = pd.merge(sbi, Dev_1km, on="ref")

risk_locations.explore("Percentile Scale", cmap="YlOrRd", tiles="CartoDB positron")

This map shows the sites at a high level of risk much more clearly. Below, using the 15-site threshold, we map the sites that can be considered “at risk”.

# Choose the final "at risk" set as the ~90th-percentile threshold (>= 15) and plot

Dev_1km = Dev_1km[Dev_1km["No. Sites in 1km"] >= 15]

Dev_1km = pd.merge(sbi, Dev_1km, on="ref")

Dev_1km.explore(tiles="CartoDB positron")